Regression Analysis in Small-n-Large-p Using Interactive Prior Elicitation of Pairwise Similarities

Authors

  • Homayun Afrabandpey
  • Tomi Peltola
  • Samuel Kaski

Abstract

In this extended abstract, we introduce a new method for eliciting experts' prior knowledge about the similarity of the roles that features play in a prediction task. The key idea is to display the features in an interactive, multidimensional-scaling-type scatterplot, elicit the similarity relationships through it, and then use the elicited relationships in the prior distribution of the prediction parameters. Specifically, when learning to predict a target variable with Bayesian linear regression, the feature relationships are used as a prior for the correlations of the regression coefficients. Simulation results, together with a preliminary study with real users on text data, confirm that prior elicitation of feature similarities improves prediction accuracy.
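The abstract does not spell out how the elicited similarities are turned into a prior over coefficient correlations. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the expert's final scatterplot positions are mapped to a similarity matrix with a Gaussian kernel (the kernel and the parameters `length_scale`, `tau2` and `sigma2` are hypothetical choices), that matrix is used as the prior covariance of the regression coefficients, and the Bayesian linear regression posterior mean is computed in its n-by-n form, which only needs an n-by-n linear solve, convenient when n is much smaller than p.

```python
import math


def similarity_from_positions(positions, length_scale=1.0):
    """Map 2-D scatterplot positions of features to a similarity matrix.

    Assumption: similarity decays with on-screen distance via a Gaussian
    kernel; the actual mapping used in the paper may differ.
    """
    p = len(positions)
    S = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            S[i][j] = math.exp(-(dx * dx + dy * dy) / (2.0 * length_scale ** 2))
    return S


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def posterior_mean(X, y, S, tau2=1.0, sigma2=0.1):
    """Posterior mean of w for y ~ N(Xw, sigma2 I) with prior w ~ N(0, tau2 S).

    Uses m = Sigma0 X^T (X Sigma0 X^T + sigma2 I)^{-1} y, so only an
    n-by-n system is solved (hypothetical tau2, sigma2 defaults).
    """
    n, p = len(X), len(X[0])
    Sigma0 = [[tau2 * S[i][j] for j in range(p)] for i in range(p)]
    # B = X Sigma0 (n x p); since Sigma0 is symmetric, B^T = Sigma0 X^T.
    B = [[sum(X[a][k] * Sigma0[k][j] for k in range(p)) for j in range(p)]
         for a in range(n)]
    # K = X Sigma0 X^T + sigma2 I (n x n), positive definite for sigma2 > 0.
    K = [[sum(B[a][j] * X[b][j] for j in range(p)) + (sigma2 if a == b else 0.0)
          for b in range(n)] for a in range(n)]
    alpha = solve(K, y)
    # m = Sigma0 X^T alpha = B^T alpha
    return [sum(B[a][j] * alpha[a] for a in range(n)) for j in range(p)]


if __name__ == "__main__":
    # Hypothetical expert layout: features 0 and 1 dragged close together,
    # features 2 and 3 placed far from both.
    positions = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (0.0, 5.0)]
    S = similarity_from_positions(positions, length_scale=1.0)
    X = [[1.0, 0.0, 0.0, 0.0]]  # n = 1 sample, p = 4 features
    y = [1.0]
    print(posterior_mean(X, y, S))
```

In this toy layout, features 0 and 1 sit close together, so a single observation that only involves feature 0 also pulls the estimate for feature 1 toward the same value, while features 2 and 3 stay near zero. With an identity prior (no elicited similarity), feature 1 would remain at zero.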

Similar articles

Hierarchical Clustering using Randomly Selected Similarities

The problem of hierarchical clustering items from pairwise similarities is found across various scientific disciplines, from biology to networking. Often, applications of clustering techniques are limited by the cost of obtaining similarities between pairs of items. While prior work has been developed to reconstruct clustering using a significantly reduced set of pairwise similarities via adapt...

Regression with n→1 by Expert Knowledge Elicitation

We consider regression under the “extremely small n large p” condition. In particular, we focus on problems with so small sample sizes n compared to the dimensionality p, even n → 1, that predictors cannot be estimated without prior knowledge. Furthermore, we assume all prior knowledge that can be automatically extracted from databases has already been taken into account. This setup occurs in p...

Kernel machine methods for integrative analysis of genome-wide methylation and genotyping studies.

Many large GWAS consortia are expanding to simultaneously examine the joint role of DNA methylation in addition to genotype in the same subjects. However, integrating information from both data types is challenging. In this paper, we propose a composite kernel machine regression model to test the joint epigenetic and genetic effect. Our approach works at the gene level, which allows for a commo...

Active Clustering: Robust and Efficient Hierarchical Clustering using Adaptively Selected Similarities

Hierarchical clustering based on pairwise similarities is a common tool used in a broad range of scientific applications. However, in many problems it may be expensive to obtain or compute similarities between the items to be clustered. This paper investigates the hierarchical clustering of N items based on a small subset of pairwise similarities, significantly less than the complete set of N(N...

Probabilistic Expert Knowledge Elicitation of Feature Relevances in Sparse Linear Regression

In this extended abstract, we consider the “small n, large p” prediction problem, where the number of available samples n is much smaller compared to the number of covariates p. This challenging setting is common for multiple applications, such as precision medicine, where obtaining additional samples can be extremely costly or even impossible. Extensive research effort has recently been dedic...

Publication date: 2016